An empirical analysis of dropout in piecewise linear networks
Abstract
The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters. In this work we empirically investigate several questions related to the efficacy of dropout, specifically as it concerns networks employing the popular rectified linear activation function. We investigate the quality of the test-time weight-scaling inference procedure by evaluating the geometric average exactly in small models, and we compare the performance of the geometric mean to that of the arithmetic mean more commonly employed by ensemble techniques. We explore the effect of tied weights on the ensemble interpretation by training ensembles of masked networks without tied weights. Finally, we investigate an alternative criterion based on a biased estimator of the maximum likelihood ensemble gradient.
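The comparison between weight-scaling inference and the exact geometric mean can be illustrated with a toy computation. The sketch below is not the paper's experimental setup: the network size, the random weights, the retention probability of 0.5, and the helper names `forward` and `compare_inference` are all assumptions made for illustration. It enumerates every dropout mask over the inputs and hidden units of a tiny rectified linear network, then reports the renormalized geometric mean, the arithmetic mean, and the weight-scaled prediction.

```python
import itertools
import math
import random


def log_softmax(z):
    # Numerically stable log-softmax over a list of logits.
    m = max(z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_norm for v in z]


def forward(x, W1, b1, W2, b2, in_scale, hid_scale):
    # One ReLU hidden layer followed by a softmax output (log-probabilities).
    # The scale vectors hold either a 0/1 dropout mask or retention probabilities.
    xs = [xi * s for xi, s in zip(x, in_scale)]
    h = [max(0.0, sum(w * v for w, v in zip(row, xs)) + b) for row, b in zip(W1, b1)]
    hs = [hi * s for hi, s in zip(h, hid_scale)]
    z = [sum(w * v for w, v in zip(row, hs)) + b for row, b in zip(W2, b2)]
    return log_softmax(z)


def compare_inference(x, W1, b1, W2, b2, keep=0.5):
    # Hypothetical helper: exact ensemble averages over all dropout masks.
    n_in, n_hid, n_out = len(x), len(W1), len(W2)
    mean_log = [0.0] * n_out  # expected log-probability per class
    arith = [0.0] * n_out     # expected probability per class
    for mask in itertools.product([0.0, 1.0], repeat=n_in + n_hid):
        prob = 1.0
        for m in mask:
            prob *= keep if m else 1.0 - keep
        logp = forward(x, W1, b1, W2, b2, mask[:n_in], mask[n_in:])
        for k in range(n_out):
            mean_log[k] += prob * logp[k]
            arith[k] += prob * math.exp(logp[k])
    # Geometric mean: exponentiate the expected log-probability, then renormalize.
    unnorm = [math.exp(v) for v in mean_log]
    total = sum(unnorm)
    geo = [v / total for v in unnorm]
    # Weight scaling: a single pass with every unit scaled by its retention probability.
    scaled_logp = forward(x, W1, b1, W2, b2, [keep] * n_in, [keep] * n_hid)
    scaled = [math.exp(v) for v in scaled_logp]
    return geo, arith, scaled


# Toy problem with assumed sizes: 3 inputs, 4 hidden units, 3 classes (2**7 masks).
rng = random.Random(0)
n_in, n_hid, n_out = 3, 4, 3
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [rng.gauss(0, 1) for _ in range(n_hid)]
W2 = [[rng.gauss(0, 1) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [rng.gauss(0, 1) for _ in range(n_out)]
x = [rng.gauss(0, 1) for _ in range(n_in)]

geo, arith, scaled = compare_inference(x, W1, b1, W2, b2)
print("geometric mean :", geo)
print("arithmetic mean:", arith)
print("weight scaling :", scaled)
```

Masking the inputs as well as the hidden units is a deliberate choice here. If dropout were applied only to the layer feeding the softmax, the renormalized geometric mean would coincide exactly with the weight-scaled prediction, so there would be no approximation to inspect.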
Related articles
No bad local minima: Data independent training error guarantees for multilayer neural networks
We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for an MNN with one hidden layer, the training error is zero at every differentiable local mi...
Discontinuous Piecewise Polynomial Neural Networks
An artificial neural network is presented based on the idea of connections between units that are only active for a specific range of input values and zero outside that range (and so are not evaluated outside the active range). The connection function is represented by a polynomial with compact support. The finite range of activation allows for great activation sparsity in the network and means...
Bayesian Proportional Hazard Analysis of the Timing of High School Dropout Decisions
In this paper, I study the timing of high school dropout decisions using data from High School and Beyond. I propose a Bayesian proportional hazard analysis framework that takes into account the specification of piecewise constant baseline hazard, the time-varying covariate of dropout eligibility, and individual, school, and state level random effects in the dropout hazard. I find that students...
Approximate Solution of Sensitivity Matrix of Required Velocity Using Piecewise Linear Gravity Assumption
In this paper, an approximate solution of the sensitivity matrix of required velocity with a final velocity constraint is derived using a piecewise linear gravity assumption. The total flight time is also fixed for the problem. Simulation results show the accuracy of the method. Increasing the midway points for linearization increases the accuracy of the solution, which, in turn, depends on the...
Dropout with Expectation-linear Regularization
Dropout, a simple and effective way to train deep neural networks, has led to a number of impressive empirical successes and spawned many recent theoretical investigations. However, the gap between dropout’s training and inference phases, introduced due to tractability considerations, has largely remained under-appreciated. In this work, we first formulate dropout as a tractable approximation o...
Journal:
- CoRR
Volume: abs/1312.6197
Publication year: 2013